Dimension Encoding for Bitwise Dimensional Co-Clustering
Abstract
In this technical report, we explain how to use histograms and Hu-Tucker encoding to create skew-resistant, balanced dimensions for our clustering scheme, Bitwise Dimensional Co-Clustering (BDCC for short). This is needed to avoid unreliable precision in BDCCscan when scanning tables at different granularities.
Similar resources
Predictive Overlapping Co-Clustering
In the past few years, co-clustering has emerged as an important data mining tool for two-way data analysis. Co-clustering has several advantages over traditional one-dimensional clustering, such as the ability to find highly correlated sub-groups of rows and columns. However, one of the overlooked benefits of co-clustering is that it can be used to extract meaningful knowledge for variou...
Model-based Co-clustering for High Dimensional Sparse Data
We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high-dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. It thereby has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the m...
Clustering Algorithms For High Dimensional Data – A Survey Of Issues And Existing Approaches
Clustering is the most prominent data mining technique used for grouping data into clusters based on distance measures. With the growth of high-dimensional data such as microarray gene expression data, grouping such data into clusters runs into the problem that the similarity between objects in the full-dimensional space is often invalid because it contains different types of...
Clustering on a Subspace of Exponential Family Using Variational Bayes Method
The e-PCA has been proposed to reduce the dimension of the parameters of probability distributions using Kullback information as a distance between two distributions. It also provides a framework for dealing with various data types such as binary and integer for which the Gaussian assumption on the data distribution is inappropriate. In this paper, we introduce a latent variable model for the e...
Efficient high dimension data clustering using constraint-partitioning k-means algorithm
With the ever-increasing size of data, clustering high-dimensional databases is a demanding task that must satisfy the requirements of both computational efficiency and result quality. To achieve both, clustering the feature space rather than the original data space has gained importance among data mining researchers. Accordingly, we performed data clustering of h...
Publication date: 2012